ABSTRACT

As high-throughput experimental techniques have become com- 
mon in the area of materials research, entirely new types of 
experimental strategies have appeared. The kinds of problems, the 
desired outcomes, and the appropriate patterns are significantly 
different from those associated with conventional experimentation. 
Classical experimental design (design of experiments, DOE) strate- 
gies grew up in a period of slow, laborious, error-prone experi- 
mentation; a modern high-throughput laboratory can test more 
materials in a week than was previously done in a year. The goal 
of this Account is to identify and critically discuss some of the 
strategies that are being developed and used in this new, exciting 
area of research. 



Introduction 

Over the past 10 years, the new research technology called 
"combinatorial chemistry" or "high-throughput screening" 
has seen exponential growth. This technology— a set of 
techniques for creating a multiplicity of compounds and 
then testing them for activity— has been widely adopted 
in the pharmaceutical industry over the past few years. 
Virtually every major drug manufacturer is now using 
these techniques as the cornerstone of its research and 
development program. In the pharmaceutical industry, 
"libraries" of 1000 to 1 000 000 distinct compounds are 
routinely created and tested for biological activity. This is 
now practical because of the convergence of low-cost 
computer systems, reliable robotic systems, sophisticated 
molecular modeling, statistical experimental strategies, 
and database software tools. 

In the last three to five years, this technology has 
expanded to materials design problems outside the drug 
field. 1 Major chemical companies have entered this arena,
either by themselves or in concert with a company such
as Symyx, 2 which specializes in new technologies for
combinatorial materials discovery. Initial work has focused
on development of robotic sample preparation, reactors,
and sensors. Some of this equipment is becoming available
commercially. With the use of this equipment, we
have found that astonishing increases in the throughput
of experimentation are possible (Figure 1).

FIGURE 1. GE's experience with high-throughput screening of a
catalyst system.

As our ability to generate large numbers of experiments 
has accelerated, we have become more conscious of the 
need to plan these experiments effectively. We find that 
the kinds of problems, the desired outcomes, and the 
appropriate strategies are significantly different from those
associated with conventional experimentation. Classical 
experimental design strategies grew up in a period of slow, 
laborious, error-prone experimentation. The landmark 
designs developed by Fisher 3 were done in agricultural 
research where one experiment per year was the norm. 
Classic industrial design of experiments (DOE) studies 4 are 
usually attempts to determine the main effects and 
interactions of factors in a minimum number of experi- 
ments. These are now almost trivial; the emphasis is on 
the discovery of complex interactions by searching ex- 
tensive chemical spaces. 

Combinatorial Methods in the Scientific 
Landscape 

The role of combinatorial methods in the general scientific 
landscape is one of scouting a wide array of possibilities 
for a low-probability "lead" to commercially interesting 
materials. This implies that the level of detailed scientific 
understanding of that area is relatively low; otherwise, 
more conventional experimentation would be more fruit- 
ful. Figure 2 gives a picture of the fit of combinatorial 
methods in the overall range of scientific strategies. As 
the level of scientific understanding of a problem increases,
the quality of the equations and mathematical
models we use to represent that understanding also
increases. Consequently, the kinds of experiments we
perform to generate data also change. At the lowest level 
of knowledge, where we only have a first insight into a 
potentially attractive chemical "universe", empirical strat- 
egies such as combinatorial methods are most attractive. 
As the system becomes better known, the number of 
potentially important factors and their ranges will decrease.
More conventional strategies such as the widely
used factorial and response surface designs 5 will then
become more appropriate.

FIGURE 3. High-throughput methodologies require a highly structured
approach to achieve the productivity improvements advertised.
Multistage screening must be integrated with laboratory- and
pilot-scale testing. (Final stage shown: candidates for scale-up;
the most promising materials are used in pilot-scale experiments.)

All of these programs in combinatorial or high-throughput
materials development use some form of a
multiphase strategy (Figure 3). A first-stage screen may 
only test for one or two critical properties which are easily 
and quickly measured on a microscale. This may be 
followed up by a second screen, also on a microscale, to 
test for other key properties or optimize the settings of 
the process parameters. The best materials will be tested 
on a standard laboratory scale where such parameters as 
mass balance are more accurately determined. Finally, a 
very few materials will become candidates for scale-up 
in a pilot facility. All of these efforts occur in the larger 
context of learning about the overall chemical system. The 
information obtained from the combinatorial experiment
is fed back to the design process in the form of appropriate
descriptors of the experimental space. These descriptors
can be used to structure or constrain the space so the 
experimental process converges more quickly. The re- 
sources for all of these phases must be in balance so there 
are no bottlenecks in the testing process. In addition, all 
of the steps in the screening process must be in proper 
balance. In a high-throughput process, you must "analyze 
in a day what you make in a day". 6 

The goals and strategies of combinatorial techniques 
applied to materials development are quite different from 
those of the pharmaceutical arena. Some of these differ- 
ences are given in Table 1. The primary goal of pharma- 
ceutical research is development of a single compound 
that is effective as a drug. The total number of druglike 
molecules is estimated to exceed 10^64 possibilities. This
leads to a focus of combinatorial drug strategy: "which
small portion of all accessible compounds should be made
to have the greatest chance of progressing the drug design
project?" 7 The most common current strategy is one of
diversity, selecting a subset of compounds which represent
the "chemical space" under investigation. This, in
turn, requires metrics that describe the chemical space;
these are typically derived from properties which can 
easily be calculated from the structure of the compounds 
being studied. 8 

In materials development, the primary goal is discovery 
of systems that meet a number of physical, chemical, and 
structural requirements. These systems may be catalysts, 
polymers, phosphors, electronic materials, pigments, or 
coatings. 9 Such systems are likely to involve several 
molecular species and process variables. An industrially 
interesting materials development problem will typically 
have been the subject of years (or decades!) of conven- 
tional research; in that work, all the primary effects and 
simple interactions of the various parameters of the 
system will have been investigated. If something new is 
to be found in the system, it will be from the synergistic 
effects of three or more parameters working together 
(Figure 4). 10 The probability of finding such three-way or
higher interactions is too low for them to be likely to be
found by conventional means. Only high-throughput
experimentation will be able to find them. 

Even with high-throughput methodology, however, the 
combinatorial explosion of possibilities represents a daunt- 
ing task. For example, in a relatively simple single-phase 
homogeneous catalyst system, the number of possible 
experiments quickly rises into the millions (Table 2). If 
we add the complications of multiple phases, as would 
occur in a heterogeneous catalyst, the possibilities grow 
even more numerous. 



Table 1.

pharmaceutical

• focused on chemical synthesis as primary
• emphasis on diversity within known metrics
• experimental space metrics known
• easy sample evaluation on nanogram level
• challenge is designing diverse libraries from very large numbers (≫10^6) of molecules

materials development

• synthesis, mixtures, and process variables
• emphasis on broad coverage and synergy
• experimental space metrics not known
• sample evaluation difficult and individual for each system
• challenge is finding high order synergies of qualitative and mixture/process variables
FIGURE 4. Highly active ternary catalyst bounded by low-activity
binaries.



Table 2. Possible Numbers of Experiments in a Representative Situation

factor                            type            levels

formulation factors
  primary catalyst                qualitative     1
  inorganic cocatalyst            qualitative     20
  amount of cocatalyst            quantitative    3
  organic ligand                  qualitative     20
  amount of ligand                quantitative    3
  active anion                    qualitative     10
  amount of anion                 quantitative    3

process factors
  reaction time                   quantitative    3
  reaction temperature            quantitative    3
  reaction pressure               quantitative    3

total number of potential runs                    2 916 000
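
The total in Table 2 is the size of the full factorial, that is, the product of the level counts of all ten factors. A minimal Python sketch of that arithmetic follows; the dictionary name and function name are illustrative only, and the level counts are transcribed from the table.

```python
from math import prod

# Levels per factor, transcribed from Table 2.
LEVELS = {
    "primary catalyst": 1,
    "inorganic cocatalyst": 20,
    "amount of cocatalyst": 3,
    "organic ligand": 20,
    "amount of ligand": 3,
    "active anion": 10,
    "amount of anion": 3,
    "reaction time": 3,
    "reaction temperature": 3,
    "reaction pressure": 3,
}


def total_runs(levels):
    """Size of the full factorial: the product of the level counts."""
    return prod(levels.values())


print(total_runs(LEVELS))  # 2916000
```

Adding one more three-level factor would triple this figure, which is why the count grows so quickly once multiple phases are considered.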



Within this complex area of research, I would suggest 
a few reasonable goals for the experimental strategist. We 
need 

• strategies to address very large, multidimensional 
experimental spaces 

• a taxonomy of the varieties of experimental spaces 

• estimates of what is discoverable, and what is not

• decision rules for deciding when to stop studying a 
space 

• predictive methods for generating fruitful experi- 
ments. 

The focus in this work will be on the first three of these 
points. 

Experimental Strategy in Combinatorial Organic 
Synthesis 

Work in the pharmaceutical industries has led to considerable
discussion of experimental strategy in this area. 11-19
These articles have mostly focused on what can broadly 
be called "diversity strategies", 18 in which 

• structural descriptors are calculated for each com- 
pound in a potential library; 

• similarity coefficients are calculated between com- 
pound pairs; and 

• compounds are selected for libraries using cluster-based,
dissimilarity-based, or partition-based methods.
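
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: descriptors are represented as sets of "on" bits, the similarity coefficient is taken to be the Tanimoto coefficient, and the selection rule is a greedy MaxMin (dissimilarity-based) pick. The toy library and all function names are hypothetical and are not drawn from the cited articles.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of descriptor bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_select(fingerprints, k, seed_index=0):
    """Greedy dissimilarity-based pick: repeatedly add the candidate whose
    most similar already-chosen neighbour is least similar to it."""
    chosen = [seed_index]
    while len(chosen) < min(k, len(fingerprints)):
        best, best_score = None, None
        for i, fp in enumerate(fingerprints):
            if i in chosen:
                continue
            nearest = max(tanimoto(fp, fingerprints[j]) for j in chosen)
            if best_score is None or nearest < best_score:
                best, best_score = i, nearest
        chosen.append(best)
    return chosen


# Hypothetical toy library: each compound is a set of "on" descriptor bits.
library = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({7, 8, 9}),
    frozenset({2, 3, 4}),
    frozenset({8, 9, 10}),
]
print(maxmin_select(library, 3))
```

Cluster-based and partition-based methods differ only in the final selection step; the descriptor and similarity calculations stay the same.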

These methodologies have often been compared against 
pure random screening. 15 The advantages and disadvan- 
tages of each method are still a subject of active debate. 

The crucial advantage that combinatorial organic syn- 
thesis has over materials development is its focus on single 
compounds as targets. From these target molecules,
descriptors can be calculated and used as metrics in a
quantitative experimental space. This space is quite large,
ranging from 19 dimensions 16 to thousands, but it is
possible to generate rational diversity within that space
using the methods mentioned above.

FIGURE 5. Ternary gradient in 10% steps.

Approaches to Experimental Strategy in 
Materials Development 

High-Speed Array Strategies. The properties of functional 
materials such as phosphors, catalysts, and electronic 
components arise from complex interactions of their 
formulation and processing. The development of descrip- 
tors in these areas is in its infancy. For that reason, 
descriptor-based experimental strategies in these areas 
have tended to be limited. "There is no approach that will 
have the generality of the combinatorial methods currently 
used for synthesis and screening of biologically active 
molecules." 20 In fact, many "combinatorial" materials 
development programs are best characterized as array 
methods for rapid performance of conventional experi- 
ments. In the following section, I will discuss some of the 
more common approaches in the literature from an 
experimental strategy viewpoint. 

1. Gradient Arrays. A common approach in solid-state 
materials studies is examination of a ternary (or higher) 
materials gradient. 21 This can be done by using continuous 
or point techniques. In "continuous composition spread", 22 
a single film with a ternary composition spread was 
generated on a 63 × 66 mm substrate in one step, and
the electronic properties were measured at ~4000 points.
The study identified an excellent dielectric, Zr0.15Sn0.3Ti0.55O2-δ. The
experimental design and strategy issues in this type of 
experiment are limited to the choice of experimental 
system and the fineness of the test gradient. This has been 
most developed in the study of electronic thin-film 
materials. It is particularly suited for identification of
narrow phase regions with suitable properties. It is dependent
on very fast, high-resolution methods of property
determination. 

A more common theme is a ternary gradient studied 
at regular intervals. Intervals such as 0-100% by 10% steps 
or 0-1% by 0.1% steps are convenient; these generate 66- 
point triangular arrays (Figure 5). For example: 

• The Pt-Pd-In system for cyclohexane dehydrogena- 
tion catalysis was studied in the 0-1% range at 0.1% 
intervals. 23 

• The Rh-Pd-Pt system for CO oxidation was studied
over the 0-100% range using 15 steps. 24

Table 3. Representative Metal Oxide Materials (Host:Activator)

host metal atoms   phosphors           superconductors   magnetoresistance
1                  Y2O3:Eu3+                             Fe2O4:Pd
2                  [illegible]         La2CuO4           La0.67Ca0.33MnO3
3                  BaMgAl10O17:Eu2+    YBa2Cu3O7

The technique can be extended to more complex 
combinations. A quaternary phase diagram was studied 
in the Pt/Ru/Os/Ir system for methanol fuel cell catalysis. 10
Typically, the response is a visual signal (either directly
or indirectly), and the analysis of the data has often been
done by visual inspection. 

In these designs, the overall shape is determined, but 
there are still important strategy decisions to be made:

• What grid density should be used? Typically these
arrays are designed to locate a relatively small region of 
phase space in which a phase with advantageous proper- 
ties is located. The grid density will determine the smallest 
phase space that can be observed. The tradeoff, of course, 
is that halving the distance between levels almost qua- 
druples the number of samples. 

• Should the grid have uniform spacing? In solid-state
chemistry, it is common for a component to have its most
important effect as a dopant at very small concentrations.
Uniform spacing may oversample the center of the space
while missing the potential dopant regions at the edge. A
logarithmic spacing (e.g., 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50,
100%) which places more points in the low-concentration
region may be advantageous.

• How do we decide whether the quality of an array is 
good enough to make it usable? An advantage of gradient 
arrays (unlike quaternary mask arrays) is that there is a 
direct geometric concentration gradient, so that trends can 
be located visually or with curve-fitting techniques. Quality 
criteria must be set for decisions based on lack of trends
or randomness. 

• How will we detect a "hit" in the array? A "pick-the-winner"
strategy is the simplest but also most likely to be
fooled by noise in the data. For screening purposes, entire 
areas of high-response compositions can be selected and 
made into "focussed" arrays. 
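
The grid-density tradeoff can be checked by enumerating the grid directly. The Python sketch below (function name illustrative) lists every composition on a regular simplex grid as integer step counts. It reproduces the 66-point ternary array at 10% steps and shows the cost of halving the spacing.

```python
from itertools import product
from math import comb


def simplex_grid(components, steps):
    """All compositions of `components` parts in increments of 1/steps,
    returned as integer tuples that sum to `steps`."""
    return [c for c in product(range(steps + 1), repeat=components)
            if sum(c) == steps]


ternary_10 = simplex_grid(3, 10)   # 0-100% in 10% steps
ternary_20 = simplex_grid(3, 20)   # spacing halved to 5% steps
print(len(ternary_10), len(ternary_20))    # 66 231
print(len(ternary_20) / len(ternary_10))   # 3.5

# Closed form for a ternary grid with n steps: C(n + 2, 2) points.
assert len(ternary_10) == comb(10 + 2, 2)
```

The same function with `components=4` gives the size of a quaternary grid, where the penalty for finer spacing is steeper still.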

2. Quaternary Mask Arrays. These designs were de- 
veloped to exploit the unique features of the inorganic 
chemistry of metal oxides. A large number of scientifically 
and commercially important materials such as phosphors,
scintillators, 25 light-emitting diodes, and superconductors
are composed of a metal oxide host lattice doped with 
small amounts of other metal atoms as activators (Table 
3). 

These structures can be summarized as AxByOz:dopant
for two host atoms; AxByCzOw:dopant for three,
and so on. The metal atoms must be chosen for the A, B,
and C positions with size and charge appropriate to the
crystal structure being built. When thin layers of metal 
oxides are deposited onto a substrate and carefully an- 
nealed, the crystal structures form spontaneously. 
FIGURE 6. Six fractal quaternary masks used in deposition studies.

FIGURE 7. The quaternary masking system uses six fractal masks,
each of which can be rotated 90° to allow four choices of material
at each level. (Figure labels on the target structure: 4 choices each
for Host A, Host B, Host C, Dopant in A, and Dopant in B; 1024 total
possibilities.)

Since there can be many choices for metals in each of 
the A, B, C, and dopant positions, the quaternary mask 
system 26 (Figure 6) was designed to enable free and flexible
choices in each position. In its current state of develop- 
ment, 1024 distinct samples can be generated in just 24 
sputtering operations using six masks. Each mask can be 
used four times by 90° rotations. This allows four choices 
for each host position and each dopant (Figure 7). 

The quaternary mask system has important advantages 
over its predecessor, the binary mask system. 27 Binary 
masks do not allow efficient separation of metals by 
function, so a large fraction of the samples made have 
compositions which do not form the correct structure. In 
all these masking systems, an additional degree of com- 
positional freedom can be added by gradually moving the 
mask or a shutter during the deposition procedure. 28 

3. High-Speed Versions of Conventional Experimental 
Designs. These designs are frequently used in the second 
stage of high-throughput screening, when a "hit" has been 
located. Since the cost of experimental points is relatively 
low, these can be quite high-resolution designs such as
full factorials, central composite designs, 5 and special
cubic or cubic mixture designs. 29 The classic experimental
design issue which frequently crops up in these experiments
is nesting. It is generally quite easy to make an array
of compositions in one of the standard designs; however, 
these arrays are usually subjected to physical treatments 
(heating, cooling, gas pressure, etc.) as units. The com- 
position variables are therefore nested 30 within the physi- 
cal treatment variables, and appropriate designs and 
analyses must be used. 31-33 
FIGURE 8. Scaffold for synthesis of enantioselective catalyst using
representational catalyst strategy. Ra, Rb, and Rc are variable
substituents.

True Combinatorial Design Strategies. There are two
possible interpretations of the term "combinatorial" in 
experimental situations. The one used in the pharmaceu- 
tical arena is in the sense of actual combinations of 
compounds in the course of experimentation. This is most 
obvious in the now classic "split-and-pool" technique, in
which polymer-bound compounds are split into separate 
vessels where each is treated with a different reagent and 
then recombined into a common pool. Repetition of this 
process yields a mixture of m^n compounds, where m is
the number of separate vessels at each step and n is the 
number of steps. This has also been used in some catalyst 
development programs where the catalyst is a single 
organic species— analogous to an enzyme. 34 

A second meaning, which I emphasize here, is the use 
of combinatorial mathematics to calculate and sample the 
possible combinations of parameters in a materials sys- 
tem. If, as noted above, the potential groundbreaking 
material innovations of the next generation will be found 
in high-level synergies, it makes sense to use appropriate 
mathematical tools to locate them. 

The use of these mathematical combinatorial methods 
will be most prevalent in the early stages of a project, 
before the investigators have been able to develop ap- 
propriate descriptors. In such a situation, we only have a 
set of potentially important factors, each of which may 
have many levels. Several strategies have been used so 
far in searching for potential valuable synergies. 

1. "Representational" Strategy. 35 In this approach, a 
molecular catalyst containing three variable substituents 
(Figure 8) was to be optimized. There were 20 possible 
substituents in each region, so the total number of 
possibilities was 20^3 = 8000. Rather than test all 8000, the
20 possible variations in the first variable region were
tested and the best selected. It was then fixed and the 20 
possibilities for the second region were tested, followed 
by fixing the best and testing the third region. This is 
illustrated in Figure 9. 

While this did find a substantially improved catalyst 
using only 60 of 8000 possible experiments, it is also a 
very limited strategy. It is entirely analogous to the "one- 
variable-at-a-time" strategies in conventional experimen- 
tation. 5 If there are any interactions between the three 
variable regions, they will not be found. 
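
The limitation can be made concrete with a small simulation. The Python sketch below implements a one-region-at-a-time search over three regions of 20 choices each and runs it on two response surfaces. Both surfaces are invented purely for illustration: one is strictly additive, the other rewards a single three-way combination. Neither represents the catalyst data in the cited work.

```python
from itertools import product

N = 20  # substituent choices per region, as in the example above


def sequential_search(activity, start=(0, 0, 0)):
    """Optimize one region at a time, fixing the best choice before moving on.
    Returns the final combination and the number of runs used."""
    current, runs = list(start), 0
    for region in range(3):
        trials = []
        for choice in range(N):
            candidate = current.copy()
            candidate[region] = choice
            trials.append((activity(tuple(candidate)), choice))
            runs += 1
        current[region] = max(trials)[1]
    return tuple(current), runs


# Hypothetical response surfaces, invented for illustration only.
def additive(c):
    return c[0] + c[1] + c[2]


def synergistic(c):
    # Weak additive trend plus a large bonus only for the combination (7, 3, 11).
    return 0.01 * sum(c) + (100 if c == (7, 3, 11) else 0)


for surface in (additive, synergistic):
    found, runs = sequential_search(surface)
    best = max(product(range(N), repeat=3), key=surface)
    print(surface.__name__, runs, found, best)
```

On the additive surface the 60-run search lands on the true optimum. On the synergistic surface it follows the weak additive trend and never tests the one combination that matters.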

2. Index Library Strategy. 36 This method was used to 
find an optimal metal-ligand catalyst combination. Given 
10 ligands and 10 metals, 100 combinations are possible. 
Instead, the 10 ligands were mixed together, and the 
mixture was tested, one at a time, with each of the metals. 
Similarly, the 10 metals were mixed together, and the 
mixture was tested with each of the ligands. The best
results from the two sets of experiments then indicated
the best metal-ligand pair (Figure 10).

FIGURE 9. "Representational" catalyst search strategy.

FIGURE 10. "Index library" catalyst search strategy.

This, too, is a very limited strategy. It can only be used
with relatively small metal-ligand or similar systems. If 
there are too many of either one, the concentration of the 
active species will be diluted to the point where it will 
not appear above the noise. Cross reactions or competi- 
tion in which multiple different ligands bond to a single 
metal may be possible. Finally, these kinds of systems will 
frequently contain catalyst poisons as well as catalysts. 
This will severely impact the usefulness of the method. 

3. All Two-Way Combinations Strategy. 37 This method 
was used to find optimal catalyst systems in a situation
where it had been found that combinations of metal 
cocatalysts were advantageous. Nineteen possible metals 
were identified, and all possible pairs of catalysts were 
tried (Figure 11). Eleven systems were identified as having 
possible synergy and were passed to secondary testing. 

This strategy also has its limitations. Only two-way 
combinations were tested here. If three-way combinations
were potentially interesting, the number of tests would jump
to (19 × 18 × 17)/(1 × 2 × 3) = 969, which was too many
for the budget. Also, the metals were only tested at a single
concentration level. If there had been an effect of relative
concentrations, this, too, would be missed. Assessing the
effect of concentrations on each metal combination would
at least triple the number of tests.

FIGURE 11. Combinatorial design for two metals (axes: nineteen metal
cocatalysts; response: catalyst synergy). The shading
in the design indicates the blocking pattern used in the experiment.

FIGURE 12. Total number of possible three-, four-, and five-way
combinations in a five-factor experiment with 2-20 levels per
factor. (Log-scale axes from 1,000 to 10,000,000; horizontal axis:
total combinations.)
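
The run counts in this strategy are binomial coefficients and are easy to verify. In the Python sketch below, the metal names are placeholders. The final line assumes exactly three concentration ratios per pair, which is only one reading of "at least triple".

```python
from itertools import combinations
from math import comb

metals = [f"M{i}" for i in range(1, 20)]   # placeholder names for 19 cocatalysts

pairs = list(combinations(metals, 2))
print(len(pairs))        # 171 two-way combinations
print(comb(19, 3))       # 969 three-way combinations

# Assumption: three relative-concentration settings per pair.
print(3 * len(pairs))    # 513 runs
```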

Idealized Strategies. If we truly have no information 
on chemical possibilities, then the most effective strategy 
will be exhaustive enumeration of all n-way combinations
of the factors. This leads to the first question: which of 
those combinations can be exhaustively studied in a 
practical experimental situation? 

To investigate this question, we ran a series of simula- 
tions using Crystal Ball 38 software. The following example 
illustrates the results we have found:

Basic Parameters

• No chemical knowledge (descriptors) assumed

• Factors are independent (no nesting)

• Five factors, each with 2-20 levels

• Moderately high throughput experimentation;
hundreds to thousands of runs feasible

• Experimental error negligible

Using this simulation, 250-1000 instances were calculated.
The results are shown in Figure 12.

From Figure 12 we can draw some immediate conclusions.
First, the total number of possible combinations
in as simple a system as this quickly rises into the
hundreds of thousands or millions. It will not be possible
to exhaustively test all the possibilities. Second, the
number of two-way combinations stays in the low thousands
and is therefore experimentally accessible with high-throughput
technology. Third, while the four-way and five-way
combinations rapidly increase to the hundreds of
thousands and are relatively impractical, the three-way
combinations, which remain in the low tens of thousands,
appear accessible.

We find further encouragement in this area when we
note that the critical parameter is the number of experimental
runs that must be performed, not the absolute
number of combinations. In a five-factor combinatorial
experiment, a single experimental run observes 

• 10 two-way combinations 
(12,13,14,15,23,24,25,34,35,45) 

• 10 three-way combinations 

(123, 124, 125, 134, 135, 145, 234, 235, 245, 345) 

• 5 four-way combinations 
(1234, 1235, 1245, 1345, 2345)

Therefore, the minimum required number of runs to
observe all n-way combinations is much less than the total
number of those combinations. In general, the theoretical
minimum number of runs to observe all n-way combinations
is the product of the number of levels of the n factors
with the largest numbers of levels. Thus,

$$\text{minimum runs} = l_{i,\max}\, l_{j,\max} \quad \text{(two-way)}$$

$$\text{minimum runs} = l_{i,\max}\, l_{j,\max}\, l_{k,\max} \quad \text{(three-way)}$$

where $l_{i,\max}$ is the number of levels of the factor with the
largest number of levels, $l_{j,\max}$ is the second largest, etc.
The actual minimum may be slightly larger than the
theoretical minimum in more irregular chemical spaces.
If we apply this calculation to the numbers of combi- 
nations found above, we discover that the number of runs 
(Figure 13) required to exhaustively study all possible two- 
way combinations is actually relatively small. Even three- 
way combinations become quite tractable. The figure also 
shows that in the three-way case there is a substantial 
advantage in working with experimental spaces with 
relatively equal number of levels. 
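
Both quantities discussed here, the total number of n-way combinations and the theoretical minimum number of runs, can be computed directly from the list of level counts. The Python sketch below is illustrative only. The function names are assumptions, and the example uses five factors at 20 levels each (the upper end of the simulated range) rather than the simulated instances themselves.

```python
from itertools import combinations
from math import comb, prod


def n_way_combinations(levels, n):
    """Number of distinct n-way level combinations over all factor subsets."""
    return sum(prod(subset) for subset in combinations(levels, n))


def min_runs(levels, n):
    """Theoretical minimum runs: product of the n largest level counts."""
    return prod(sorted(levels, reverse=True)[:n])


levels = [20, 20, 20, 20, 20]   # five factors, 20 levels each
for n in (2, 3, 4, 5):
    # n, n-way combinations seen per run, total n-way combinations, minimum runs
    print(n, comb(len(levels), n), n_way_combinations(levels, n), min_runs(levels, n))
```

For this case the two-way combinations number 4000 but need only 400 runs, and the 80 000 three-way combinations need 8000. With unequal level counts the gap between the two numbers narrows, since a few large factors dominate the minimum.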

Strategies for Observing Two-Way Combinations. 
Two-way combinations are relatively easy to observe, even 
in rather complex systems. The mathematical description 
for an array which exhaustively samples all two-way
combinations of a set is an "orthogonal array of strength
2 and index 1". 39 The classical Latin square design is such
an array which efficiently samples all two-way combinations
in a symmetrical system, with all factors having the
same number of levels (Figure 14).

Latin squares can be generalized to less symmetrical
systems such as Youden squares, and orthogonal arrays
of strength 2 are relatively easy to construct.
FIGURE 13. Minimum number of runs necessary to sample all two-way ... per factor.

FIGURE 14. A Latin square observes all 64 two-way combinations
of three factors with four levels each using 16 runs.

Strategies for Observing Three-Way Combinations.
Three-way combinations are less easy to exhaustively
observe. Although a "Latin cube" is possible for perfectly
symmetrical systems, it is not generalizable. Orthogonal
arrays of strength 3 and index 1, which would be required
for these systems, are relatively rare and are quite difficult
to construct. 40 Therefore, algorithmic approaches are
required. We have examined three strategies in this area: 

• Random Runs 

• Genetic Algorithms 

• Computer-Generated Test Plans 

1. Random Runs. The use of randomly chosen runs in 
a combinatorial study was investigated using simulation. 
The basic assumptions of the study were 

• No chemical knowledge assumed 

• Independent factors 

• Six factors with 2-6 levels each or 
eight factors with 2-8 levels each 

In each iteration of this simulation, a set of levels for 
each factor was randomly selected, and the list of all 
possible three-factor combinations was generated. Sets of
10, 20, 40, ... runs were then randomly generated and the 
resulting combinations checked off the list. The results 
of this simulation are given in Figure 15. It shows that 
random runs can be a relatively efficient method of 
sampling the three-way combinations in a fairly complex 
experiment. Approximately 80% of the combinations have 
been sampled by the time the theoretical minimum 
number of runs has been completed. Exhaustive sampling,
however, is less successful; it requires about three
times the theoretical minimum to sample 99% of the total 
combinations. 
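
A simulation of this kind can be sketched in Python as follows. This is a reconstruction under assumptions, not the original study: it uses a seeded standard-library random generator, one random factor/level configuration, and a run budget of three times the theoretical minimum. All function names are illustrative.

```python
import random
from itertools import combinations, product
from math import prod


def three_way_targets(levels):
    """Every (factor triple, level triple) that a run could observe."""
    targets = set()
    for triple in combinations(range(len(levels)), 3):
        for lv in product(*(range(levels[f]) for f in triple)):
            targets.add((triple, lv))
    return targets


def coverage_curve(levels, n_runs, rng):
    """Fraction of three-way combinations seen after each random run."""
    targets = three_way_targets(levels)
    seen, curve = set(), []
    for _ in range(n_runs):
        run = [rng.randrange(n_levels) for n_levels in levels]
        for triple in combinations(range(len(levels)), 3):
            seen.add((triple, tuple(run[f] for f in triple)))
        curve.append(len(seen) / len(targets))
    return curve


rng = random.Random(0)                             # fixed seed for repeatability
levels = [rng.randint(2, 6) for _ in range(6)]     # six factors, 2-6 levels each
t_min = prod(sorted(levels, reverse=True)[:3])     # theoretical minimum runs
curve = coverage_curve(levels, 3 * t_min, rng)
print(levels, t_min)
print(round(curve[t_min - 1], 2), round(curve[-1], 2))
```

The two printed fractions are the coverage at one and at three times the theoretical minimum. Repeating the calculation over many seeds and configurations would give the spread that the error bars in such a plot represent.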
FIGURE 15. Three-way combinations observed with random runs.
The results of each simulation are reported relative to the theoretical 
minimum runs for each combination of factors and levels. Each 
simulation was run at a given factor/level combination 10 times; the 
error bars show the range of the data. 

2. Genetic Algorithms. Genetic algorithms 41 (GAs) are 
a popular method of searching for optima in fields varying 
as widely as truck manufacturing and drug design. They 
have the advantage of being assumption free; they will 
work if there is any underlying structure to the experimental
space, even if we cannot figure it out. The process
of experimentation using genetic algorithms is straight- 
forward: 

• Selection of an experimental space consisting of 
compositional and process parameters which are com- 
bined to form a "genetic code" for producing the desired 
materials. 

• Initialization of a first generation of materials. This is 
usually done by random selection, but it can be seeded 
with known "good" runs or constrained by prior knowl- 
edge. 

• Preparation and testing of the materials from the first 
generation.

• Prioritizing the genetic codes from the first generation 
as "parents" for the next generation on the basis of the 
testing responses. 

• Creation of the next generation from those parents 
by applying the evolutionary operators of crossover, 
qualitative mutation, and quantitative mutation. The 
critical design decisions in this methodology bear on the 
tradeoff between the rate of convergence on the best 
material vs the certainty of convergence. This is sum- 
marized in Table 4. 42 The principal disadvantage of GA 



VOL 34, NO. 3, 2001 / ACCOUNTS OF CHEMICAL RESEARCH 219 



Combinatorial and High-Throughput Materials Development Cawse 





" ^Population Shift^y- - 



lTl n, 



BGen 1 
0Gen2 
□ Gen 3 



Catalyst Activity 

FIGURE 16. Three generations of a population of catalysts with 55 
formulations per generation. 

Table 5. Possible States of Automatic Telephone
Software

call type        billing    access    status
local            caller     loop      success
long distance    collect    ISDN      busy
international    800        PBX       blocked
Table 6. Nine-Run Design That Tests All 54 Two-Way
Combinations

run    call type        billing    access    status
1      local            collect    PBX       busy
2      long distance    800        loop      busy
3      international    caller     ISDN      busy
4      local            800        ISDN      blocked
5      long distance    caller     PBX       blocked
6      international    collect    loop      blocked
7      local            caller     loop      success
8      long distance    collect    ISDN      success
9      international    800        PBX       success


strategies is the number of generations required for
convergence. Most GA optimizations are run on computer
models, so the cost and time required for running dozens
to hundreds of generations are low. When a generational
cycle requires a full process of running and analyzing an
experiment, the cost may be too high. In a representative
experiment run for three generations in our laboratory
(Figure 16), there was a clear population shift toward
higher activity, but the rate was too slow to be practical.
In other (bio)chemical systems where GAs have been used,
convergence or leveling of improvement has occurred in
10-20 generations. 43,44
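
The generational cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in our laboratory: the factor names (CATALYSTS, LIGANDS, conc), the synthetic measure_activity function that stands in for preparing and testing a material, and the choice of truncation selection (keeping the top n_parents) are all hypothetical. Only the population size of 55 and the three generations are taken from Figure 16.

```python
import random

# Hypothetical experimental space: two qualitative factors, one quantitative.
CATALYSTS = ["A", "B", "C", "D"]
LIGANDS = ["L1", "L2", "L3"]
CONC_RANGE = (0.0, 1.0)


def random_code(rng):
    """Random 'genetic code' used to initialize the first generation."""
    return {
        "catalyst": rng.choice(CATALYSTS),
        "ligand": rng.choice(LIGANDS),
        "conc": rng.uniform(*CONC_RANGE),
    }


def measure_activity(code):
    """Synthetic stand-in for preparing and testing one material."""
    bonus = 1.0 if (code["catalyst"], code["ligand"]) == ("C", "L2") else 0.0
    return bonus + 1.0 - abs(code["conc"] - 0.7)


def crossover(a, b, rng):
    """Each gene is inherited from one of the two parents at random."""
    return {key: rng.choice((a[key], b[key])) for key in a}


def mutate(code, rng, p_qual=0.1, p_quant=0.2, step=0.1):
    """Qualitative mutation swaps a choice; quantitative mutation nudges a value."""
    child = dict(code)
    if rng.random() < p_qual:
        child["catalyst"] = rng.choice(CATALYSTS)
    if rng.random() < p_qual:
        child["ligand"] = rng.choice(LIGANDS)
    if rng.random() < p_quant:
        lo, hi = CONC_RANGE
        child["conc"] = min(hi, max(lo, child["conc"] + rng.gauss(0.0, step)))
    return child


def run_ga(generations=3, pop_size=55, n_parents=10, seed=0):
    """Return the mean activity of each generation."""
    rng = random.Random(seed)
    population = [random_code(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Each material is "tested" once, then ranked by its response.
        scored = sorted(
            ((measure_activity(c), c) for c in population),
            key=lambda pair: pair[0],
            reverse=True,
        )
        history.append(sum(score for score, _ in scored) / pop_size)
        parents = [code for _, code in scored[:n_parents]]
        population = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size)
        ]
    return history
```

Calling run_ga() returns one mean activity per generation. Raising n_parents or the mutation probabilities keeps more diversity in the population at the cost of slower convergence, which is the tradeoff between rate and certainty of convergence noted above.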

3. Computer-Generated Test Plans. With a computer-aided
algorithm, it is possible to exhaustively enumerate
all possible n-way combinations. This can be followed by
selection of an appropriate subset of runs that will sample
all n-way combinations. Fortunately, this problem has
already been solved in another context: software test
generation. Proper testing of software requires examination
of combinations of inputs to test for untoward
interactions. For example, an automatic telephone system
might require examination of the possibilities shown in
Table 5. There are 81 possible scenarios in this situation
(four factors at three levels each), which contain 54
possible two-way combinations (six factor pairs with nine
level pairs each). All of these combinations can be sampled
in only nine experimental runs (Table 6).
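
The counts quoted for the telephone example can be checked directly. The short Python sketch below enumerates the scenarios and two-way combinations from the factor levels of Table 5 and confirms that the nine runs of Table 6 cover every pair; the variable and function names are illustrative only.

```python
from itertools import combinations, product

# Factor levels from Table 5.
FACTORS = {
    "call type": ["local", "long distance", "international"],
    "billing": ["caller", "collect", "800"],
    "access": ["loop", "ISDN", "PBX"],
    "status": ["success", "busy", "blocked"],
}

# The nine runs of Table 6, in the same factor order as FACTORS.
NINE_RUNS = [
    ("local", "collect", "PBX", "busy"),
    ("long distance", "800", "loop", "busy"),
    ("international", "caller", "ISDN", "busy"),
    ("local", "800", "ISDN", "blocked"),
    ("long distance", "caller", "PBX", "blocked"),
    ("international", "collect", "loop", "blocked"),
    ("local", "caller", "loop", "success"),
    ("long distance", "collect", "ISDN", "success"),
    ("international", "800", "PBX", "success"),
]


def pairs_in(run):
    """Every two-way combination of (factor index, level) present in one run."""
    return {
        ((i, run[i]), (j, run[j]))
        for i, j in combinations(range(len(run)), 2)
    }


def all_pairs(levels):
    """Every two-way combination that exists in the full experimental space."""
    wanted = set()
    for i, j in combinations(range(len(levels)), 2):
        for a, b in product(levels[i], levels[j]):
            wanted.add(((i, a), (j, b)))
    return wanted


levels = list(FACTORS.values())
scenarios = list(product(*levels))
wanted = all_pairs(levels)
covered = set().union(*(pairs_in(run) for run in NINE_RUNS))

print(len(scenarios), len(wanted), wanted <= covered)  # prints: 81 54 True
```

Each run contains six two-way combinations, so nine runs can cover at most 54; the design in Table 6 reaches that limit with no pair repeated.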




A Web-based software service 45 has now been com- 
mercialized to generate such test plans. We have found it 
to be reasonably user-friendly and capable of accom- 
modating complex experiments and constraints. For 
example, a catalyst system consisting of 

• Primary catalyst: 4 possibilities

• Metal cocatalyst: 2 × 5 possibilities @ 2 concentrations

• Cocatalyst ligand: 6 possibilities

• Nonmetal cocatalyst: 3 possibilities @ 3 concentrations

• Process factors: 3 @ 2 levels

contained 6075 possible three-way combinations. The 
theoretical minimum number of runs to sample all three- 
way combinations is 150; the algorithm was able to find 
a 167-run plan that actually sampled them all. 
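
A gap between the minimum and the plan actually found is typical of heuristic search. The Python sketch below is a simple greedy heuristic of our own choosing for illustration; it is not the algorithm of the commercial service, which is not described here. The function names are hypothetical. The lower_bound function uses one common bound, the product of the t largest level counts, which may or may not be the definition of "theoretical minimum" used above. Because the sketch enumerates every candidate run, it is practical only for small experimental spaces.

```python
from itertools import combinations, product


def lower_bound(levels, t):
    """Product of the t largest level counts; no covering plan can be smaller."""
    sizes = sorted((len(lv) for lv in levels), reverse=True)
    bound = 1
    for size in sizes[:t]:
        bound *= size
    return bound


def combos_of(run, t):
    """All t-way combinations of (factor index, level) contained in one run."""
    return {
        tuple((i, run[i]) for i in idx)
        for idx in combinations(range(len(run)), t)
    }


def greedy_plan(levels, t):
    """Repeatedly add the run that covers the most still-uncovered combinations."""
    candidates = list(product(*levels))
    uncovered = set().union(*(combos_of(run, t) for run in candidates))
    plan = []
    while uncovered:
        best = max(candidates, key=lambda run: len(combos_of(run, t) & uncovered))
        plan.append(best)
        uncovered -= combos_of(best, t)
    return plan


# Example: the telephone problem (four factors at three levels), two-way coverage.
example_levels = [range(3)] * 4
example_plan = greedy_plan(example_levels, 2)
print(lower_bound(example_levels, 2), len(example_plan))
```

The greedy plan always covers every combination, but it is not guaranteed to reach the lower bound; more elaborate search strategies trade computing time for shorter plans.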

The principal limitations of these test plans are the 
following: 

• They are highly dependent on the significant interaction
effects being synergistic rather than antagonistic.
Even a modest poisoning effect can obliterate a large
portion of the design.

• They require that the desired high-order interaction 
effect be relatively large, while the main effects and low- 
order interactions remain small. Otherwise, the desired 
observation will be drowned in the noise of the additive 
lower-order effects. 

• The lack of redundancy requires that the quality of 
the experimental system be very high. 

If these assumptions are met, a simple histogram or
normal probability plot of the response data will identify
the runs containing strongly positive interactions. If there
is more than one such run, the active factors will be
indicated by simple comparison. If there is only one, a
follow-up design can be run with only two levels/factor to
home in on the active factors. A resolution IV fractional
factorial design (32 runs in the catalyst case above) will
cover all the possible three-way combinations. 46
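
The "simple comparison" of high-response runs amounts to finding the factor settings they share. A minimal Python sketch follows; the function name, the example data, and the threshold (which in practice would be read off the histogram or normal probability plot) are hypothetical.

```python
def shared_settings(runs, responses, threshold):
    """Return the factor settings common to every run whose response exceeds threshold."""
    hits = [run for run, y in zip(runs, responses) if y > threshold]
    if not hits:
        return {}
    common = dict(hits[0])
    for run in hits[1:]:
        common = {k: v for k, v in common.items() if run.get(k) == v}
    return common


# Hypothetical data: two runs stand out, and they share catalyst, ligand, and temperature.
runs = [
    {"catalyst": "C", "metal": "Cu", "ligand": "L2", "temp": "high"},
    {"catalyst": "C", "metal": "Fe", "ligand": "L2", "temp": "high"},
    {"catalyst": "A", "metal": "Cu", "ligand": "L2", "temp": "low"},
    {"catalyst": "C", "metal": "Cu", "ligand": "L1", "temp": "high"},
]
responses = [9.1, 8.7, 1.2, 0.9]

active = shared_settings(runs, responses, threshold=5.0)
```

With a single high-response run, the function returns every setting of that run and cannot narrow the candidates further, which is why a follow-up design is needed in that case.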

Conclusion 

These methods of high-throughput materials development 
are still in a rapid state of development, and experimental 
strategies appropriate to each methodology are also 
appearing rapidly. This Account does not delve into the 
full complexities of statistical analysis which may be 
required for some of these approaches; it is a very good 
idea to have an experienced statistician as a full member 
of the team. Finally, the quality issues inherent in operation
of an automated, high-throughput experimental 
system are substantial and will be discussed in a subse- 
quent article. 

The author thanks the GE Corporate R&D Combinatorial 
Chemistry program, led by Terry Leib, and the Applied Statistics 
Program, led by Gerald Hahn, for their contributions which helped 
make this review possible. Special thanks go to Charlie Hendrix 
(South Charleston, WV) for many fruitful discussions on the 
strategy of experimentation. 